POSIX Regular Expression Parsing with Derivatives

Authors

  • Martin Sulzmann
  • Kenny Zhuo Ming Lu
Abstract

We adapt the POSIX policy to the setting of regular expression parsing. POSIX favors longest left-most parse trees. Compared to other policies such as greedy left-most, the POSIX policy is more intuitive but much harder to implement. Almost all POSIX implementations are buggy as observed by Kuklewicz. We show how to obtain a POSIX algorithm for the general parsing problem based on Brzozowski’s regular expression derivatives. Correctness is fairly straightforward to establish and our benchmark results show that our approach is promising.
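
As a rough illustration of the derivative operation the abstract builds on, the following Python sketch implements plain Brzozowski-style matching, which only answers yes or no. It is not the authors' POSIX parsing algorithm, which additionally constructs longest left-most parse trees. All class and function names (`Re`, `nullable`, `deriv`, `matches`) are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular expression syntax trees (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Re):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Re):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Re):
    c: str


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Seq(Re):
    first: Re
    second: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


def nullable(r: Re) -> bool:
    """Return True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    raise TypeError(f"unknown regex node: {r!r}")


def deriv(r: Re, c: str) -> Re:
    """Derivative of r with respect to c: what remains to match after reading c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Seq):
        head = Seq(deriv(r.first, c), r.second)
        # If the first part can be empty, c may also be consumed by the second part.
        return Alt(head, deriv(r.second, c)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(deriv(r.inner, c), r)
    raise TypeError(f"unknown regex node: {r!r}")


def matches(r: Re, s: str) -> bool:
    """Take one derivative per input character, then test for nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)


# An ambiguous expression, (a|ab)(c|bcd)(d*): "abcd" can be split in two ways,
# which is the kind of case where a disambiguation policy such as POSIX matters.
example = Seq(
    Seq(Alt(Chr("a"), Seq(Chr("a"), Chr("b"))),
        Alt(Chr("c"), Seq(Chr("b"), Seq(Chr("c"), Chr("d"))))),
    Star(Chr("d")),
)
```

The matcher above cannot say which of the two splits of "abcd" was chosen. A parsing algorithm has to return that information as a parse tree, and the POSIX policy fixes which tree is the right one.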


Similar resources

POSIX Lexing with Derivatives of Regular Expressions (Proof Pearl)

Brzozowski introduced the notion of derivatives for regular expressions. They can be used for a very simple regular expression matching algorithm. Sulzmann and Lu cleverly extended this algorithm in order to deal with POSIX matching, which is the underlying disambiguation strategy for regular expressions needed in lexers. Sulzmann and Lu have made available on-line what they call a “rigorous pr...


Derivative-Based Diagnosis of Regular Expression Ambiguity

Regular expressions are often ambiguous. We present a novel method based on Brzozowski’s derivatives to aid the user in diagnosing ambiguous regular expressions. We introduce a derivative-based finite state transducer to generate parse trees and minimal counter-examples. The transducer can be easily customized to either follow the POSIX or Greedy disambiguation policy and based on a finite set ...


Recognising and Generating Terms using Derivatives of Parsing Expression Grammars

Grammar-based sentence generation has been thoroughly explored for Context-Free Grammars (CFGs), but remains unsolved for recognition-based approaches such as Parsing Expression Grammars (PEGs). Lacking tool support, language designers using PEGs have difficulty predicting the behaviour of their parsers. In this paper, we extend the idea of derivatives, originally formulated for regular express...


Certified Derivative-Based Parsing of Regular Expressions

We describe the formalization of a certified algorithm for regular expression parsing based on Brzozowski derivatives, in the dependently typed language Idris. The formalized algorithm produces a proof that an input string matches a given regular expression or a proof that no matching exists. A tool for regular expression based search in the style of the well known GNU grep has been developed w...


FIRE/J - optimizing regular expression searches with generative programming

Regular expressions are a powerful tool for analyzing and manipulating text. Their theoretical background lies within automata theory and formal languages. The FIRE/J (Fast Implementation of Regular Expressions for Java) regular expression library is designed to provide maximum execution speed, while remaining portable across different machine architectures. To achieve that, FIRE/J transforms e...




Publication date: 2014